More Projects




Model Behavior Video

I appeared in a video series for Uptake about how modeling industrial failures works. This was really fun, and I hope people will give it a look! I talk about a specific failure of diesel locomotive hardware and how I solved it while I was working at Uptake.


Radlibs! in R or in Python

I wrote a silly R package called radlibs that lets you make your own madlibs. Then I wrote a version in Python, and published both to CRAN and PyPI. Data science doesn't always have to be serious. Use install.packages("radlibs") or pip install radlibs to get these packages. Issues and feedback welcome!

Keywords: python, r, packages


Evaluation of R Forwards Package Workshop

I recently co-taught a daylong course for a group of 30 women and gender-nonbinary students about how to write R packages, and we had a really good time! I analyzed our pre- and post-course surveys in a notebook to check how effective the day was for students.

Keywords: r, data visualization




Tutorial: Fun with Real Estate Data

This project is a Kaggle kernel in which I walk the reader through cleaning and modeling a real estate prices dataset, using linear modeling, random forests, and gradient boosting (xgboost). It is my most popular kernel to date! It also produced respectable competition results and was chosen for special recognition by the Kaggle admins. (I won a mug!)

Update: Read the interview I did regarding this project (and the other fabulous winners)! http://blog.kaggle.com/2017/03/29/predicting-house-prices-playground-competition-winning-kernels

Keywords: machine learning, data cleaning




Data for Democracy 2017 Hackathon

I led a team working on the Chicago Lobbying project, which produced some great output, including this visualization of lobbying and aldermen in Chicago. The project is continuing and building out new functionality. I personally cleaned some of the underlying data, but my biggest contributions were organizing, planning, and leadership. Additional results: https://data.world/lilianhj/chicago-lobbyists

Update: Check out a case study by the fine folks at data.world discussing the work that went into this project: https://medium.com/@sharonbrener/dbf30aeee70b

Update: I have turned off this application for the time being, but if you are interested in it contact me!




Exploring Austin, Texas Crime

Among the public datasets available on Kaggle is this one, describing the crimes that occurred in Austin, TX over a couple of years. This project cleans the data, does some exploratory analysis, and maps various kinds of crime by district.

Key Skills: data cleaning, GIS



Interruptions at the First Presidential Debate

My first natural language processing/text mining project! This was a lot of fun, because I watched the debate and then got to examine how well my own perceptions matched what the data told me.

Key Skills: NLP, data cleaning




Medical No-Shows in a Brazilian Hospital

What features of patients help providers predict who is at risk of not showing up to appointments? This analysis offers insights that the hospital that supplied the data could use to improve its patient care.

Key Skills: data cleaning, modeling, data visualization, machine learning




Dental Care in the ACA Marketplace

I think this is a good kernel, but it never got traction because the data was not glamorous and the results were not very cheerful. In short, the dental coverage available through the ACA marketplace is, unfortunately, seriously inadequate for the population's needs.

Key Skills: data cleaning, GIS




Tidy Text Mining in Facebook Posts

In this project, I used a provided dataset of Facebook posts from a community group and analyzed a few details about the content, specifically how sentiment and gender related to “likes” on the posts.

Key Skills: NLP, data cleaning, data visualization